SVM-OD: SVM Method to Detect Outliers

Authors

  • Jiaqi Wang
  • Chengqi Zhang
  • Xindong Wu
  • Hongwei Qi
  • Jue Wang
Abstract

Outlier detection is an important task in data mining because outliers can be either useful knowledge or noise. Many statistical methods have been applied to detect outliers, but they usually assume a given data distribution and have difficulty with high-dimensional data. The Statistical Learning Theory (SLT) established by Vapnik et al. provides a new way to overcome these drawbacks. Based on SLT, Schölkopf et al. proposed a ν-Support Vector Machine (ν-SVM) and applied it to detect outliers. However, it is still difficult for data mining users to choose one key parameter of ν-SVM. This paper proposes a new SVM method to detect outliers, SVM-OD, which avoids this parameter. We provide a theoretical analysis based on SLT as well as experiments to verify the effectiveness of our method. Moreover, an experiment on synthetic data shows that SVM-OD can detect some local outliers near the cluster with some distribution, while ν-SVM cannot.

This research is partly supported by the National Key Project for Basic Research in China (01998030508).

Similar resources

Density Based Support Vector Machines for Classification

Support Vector Machines (SVM) is the most successful algorithm for classification problems. SVM learns the decision boundary from two classes (for Binary Classification) of training points. However, sometimes there are some less meaningful samples amongst training points, which are corrupted by noises or misplaced in wrong side, called outliers. These outliers are affecting on margin and classi...

An adaptive error penalization method for training an efficient and generalized SVM

A novel training method has been proposed for increasing efficiency and generalization of support vector machine (SVM). The efficiency of SVM in classification is directly determined by the number of the support vectors used, which is often huge in the complicated classification problem in order to represent a highly convoluted separation hypersurface for better nonlinear classification. Howeve...

Increasing Efficiency of SVM by Adaptively Penalizing Outliers

In this paper, a novel training method is proposed to increase the classification efficiency of support vector machine (SVM). The efficiency of the SVM is determined by the number of support vectors, which is usually large for representing a highly convoluted separation hypersurface. We noted that the separation hypersurface is made unnecessarily over-convoluted around extreme outliers, which d...

Tracking of Object with SVM Regression

This paper presents a novel feature-matching based approach for rigid object tracking. The proposed method models the tracking problem as discovering the affine transforms of object images between frames according to the extracted feature correspondences. False feature matches (outliers) are automatically detected and removed with a new SVM regression technique, where outliers are iteratively i...

Debnath, Banerjee, Namboodiri: Adapting Ransac-svm to Detect Outliers for Robust Classification

Most visual classification tasks assume the authenticity of the label information. However, due to several reasons such as difficulty of annotation or inadvertently due to human error, the annotation can often be noisy. This results in wrongly annotated examples. In this paper, we consider the examples that are wrongly annotated to be outliers. The task of learning a robust inlier model in the ...


Publication date: 2006